import pandas as pd
data1 = {
    'A': [1, 2, 2, 3, 3],
    'B': [4, 5, 5, 6, 7]
}
df1 = pd.DataFrame(data1)
df1

| | A | B |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 2 | 5 |
| 3 | 3 | 6 |
| 4 | 3 | 7 |
Mohammed Adil Siraju
September 19, 2025
Welcome to this tutorial on data cleaning using Pandas! Data cleaning is a crucial step in any data analysis workflow. In this notebook, we’ll cover two essential techniques:

- Handling duplicates: Removing or managing repeated rows.
- Detecting and removing outliers: Using statistical methods like IQR (Interquartile Range).
By the end, you’ll have practical skills to preprocess messy datasets effectively.
First, let’s import Pandas and create a sample DataFrame to work with.
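Duplicate rows can skew your analysis by giving some observations extra weight in counts and averages. Pandas provides `duplicated()` to flag repeated rows and `drop_duplicates()` to remove them.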
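As a minimal sketch of duplicate handling, the snippet below rebuilds the sample DataFrame from the start of the notebook, flags repeated rows, and drops them. The variable names `dup_mask`, `deduped`, and `deduped_by_a` are illustrative, and deduplicating on column `A` alone is an assumption added here to show the `subset` parameter.

```python
import pandas as pd

# Same sample data as the notebook's df1
df1 = pd.DataFrame({
    'A': [1, 2, 2, 3, 3],
    'B': [4, 5, 5, 6, 7],
})

# Boolean mask: True where a row repeats an earlier row in every column
dup_mask = df1.duplicated()
print(dup_mask.sum())  # number of fully duplicated rows

# Drop full-row duplicates, keeping the first occurrence of each
deduped = df1.drop_duplicates()

# Illustrative variant (assumption): treat rows as duplicates
# when column 'A' alone repeats
deduped_by_a = df1.drop_duplicates(subset=['A'], keep='first')

print(deduped)
print(deduped_by_a)
```

Note that `drop_duplicates()` returns a new DataFrame and keeps the original index labels; chaining `.reset_index(drop=True)` renumbers the rows if that matters downstream.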
Outliers are extreme values that can distort statistical analysis. We’ll use the IQR method to detect and filter them.
| | A | B |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 5 |
| 2 | 2 | 5 |
| 3 | 3 | 6 |
| 4 | 11 | 12 |
| 5 | 11 | 25 |
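The IQR filter described above can be sketched on this sample as follows. This is one possible version, not necessarily the notebook's original code: the name `df2`, the choice of column `B`, and the conventional 1.5 × IQR fences are all assumptions made for illustration.

```python
import pandas as pd

# Sample data matching the table above (the name df2 is an assumption)
df2 = pd.DataFrame({
    'A': [1, 2, 2, 3, 11, 11],
    'B': [1, 5, 5, 6, 12, 25],
})

# Quartiles and interquartile range for column 'B' (column choice is an assumption)
q1 = df2['B'].quantile(0.25)
q3 = df2['B'].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond each quartile
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Rows outside the fences are treated as outliers
outliers = df2[(df2['B'] < lower) | (df2['B'] > upper)]

# Keep only rows whose 'B' value lies within the fences
filtered = df2[df2['B'].between(lower, upper)]

print(outliers)
print(filtered)
```

With pandas' default linear interpolation for quantiles, only the row with `B = 25` falls outside the fences here. The 1.5 multiplier is a rule of thumb; a larger value such as 3 makes the filter more conservative.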